# Open-domain visual understanding
## vit_base_patch16_clip_224.datacompxl (timm) · Apache-2.0

A Vision Transformer image encoder from the CLIP family, intended for image feature extraction. It uses the ViT-B/16 configuration and was pretrained on the DataComp-XL dataset.

Tags: Image Classification · Transformers
Downloads: 36 · Likes: 0
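
A minimal sketch of extracting pooled image features with this model through the `timm` API. The model id follows the card above; `example.jpg` is a hypothetical local image, and the pretrained weights are downloaded from the hub on first use.

```python
import timm
import torch
from PIL import Image

# num_classes=0 removes the classifier head, so the model returns
# pooled image features instead of classification logits.
model = timm.create_model(
    "vit_base_patch16_clip_224.datacompxl",
    pretrained=True,
    num_classes=0,
)
model.eval()

# Build the preprocessing pipeline the weights were trained with.
config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**config, is_training=False)

image = Image.open("example.jpg").convert("RGB")  # hypothetical input image
with torch.no_grad():
    features = model(transform(image).unsqueeze(0))

print(features.shape)  # expected: torch.Size([1, 768]) for ViT-B/16
```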
## clip-vit-large-patch14 (openai)

CLIP is a vision-language model developed by OpenAI that maps images and text into a shared embedding space through contrastive learning, enabling zero-shot image classification.

Tags: Image-to-Text
Downloads: 44.7M · Likes: 1,710
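
A minimal zero-shot classification sketch using the Hugging Face `transformers` CLIP classes; the candidate labels and the `cat.jpg` path are placeholders. The label whose text embedding is most similar to the image embedding receives the highest probability, with no task-specific fine-tuning.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "openai/clip-vit-large-patch14"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

image = Image.open("cat.jpg")  # hypothetical input image
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

# Encode the image and candidate captions into the shared embedding space.
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds temperature-scaled image-text similarities;
# a softmax over the labels yields zero-shot class probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```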